Going Beyond Word Cooccurrences in Global Lexical Selection for Statistical Machine Translation using a Multilayer Perceptron

Authors

  • Alexandre Patry
  • Philippe Langlais
Abstract

Phrase-based statistical machine translation (PBSMT) decoders translate source sentences one phrase at a time, making strong independence assumptions over the source phrases. Translation table scores are typically independent of context, language model scores depend only on a few words surrounding the target phrase, and distortion models do not directly influence the choice of target phrases. In this work, we propose to condition the selection of each target word on the whole source sentence using a multilayer perceptron (MLP). Our interest in MLPs lies in their hidden layer, which encodes source sentences in a representation that is not directly tied to the notion of a word. We evaluated our approach on an English-to-French translation task. Our MLP model improved BLEU scores over a standard PBSMT system.
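To make the idea of conditioning each target word on the whole source sentence concrete, here is a minimal sketch of an MLP forward pass. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the input encoding, layer sizes, activation functions, or training procedure. The binary bag-of-words input, the `tanh` hidden layer, the independent sigmoid outputs, the toy vocabularies, and the names `TinyMLP` and `bag_of_words` are all hypothetical, and the weights are random rather than trained.

```python
import math
import random


def bag_of_words(sentence, vocab):
    """Binary vector: 1.0 if the vocabulary word occurs in the sentence."""
    words = set(sentence.lower().split())
    return [1.0 if w in words else 0.0 for w in vocab]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One hidden layer; one independent sigmoid output per target word.

    Assumption: weights are randomly initialised here, purely to show the
    data flow. A real model would be trained on parallel sentences.
    """

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def hidden(self, x):
        # Dense encoding of the whole source sentence; its units are
        # not tied to any individual source word.
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        # Score for each target word, given the full source sentence.
        h = self.hidden(x)
        return [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b)
                for row, b in zip(self.w2, self.b2)]


# Toy vocabularies (hypothetical, for illustration only).
source_vocab = ["the", "house", "is", "small", "big"]
target_vocab = ["la", "maison", "est", "petite", "grande"]

mlp = TinyMLP(len(source_vocab), 3, len(target_vocab))
x = bag_of_words("the house is small", source_vocab)
probs = mlp.predict(x)

for word, p in zip(target_vocab, probs):
    print(f"{word}: {p:.3f}")
```

Every output unit reads the same hidden vector, so each target-word score depends on all source words jointly rather than on a single aligned word or phrase. How such scores would be integrated into a PBSMT decoder is not described in the abstract and is left out of this sketch.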

Similar resources

Prediction of Words in Statistical Machine Translation using a Multilayer Perceptron

We propose to estimate the probability that a target word appears in the translation of a given source sentence using a multilayer perceptron. At the expense of ignoring word order and repetition, our model does not assume word alignments and considers all source words jointly when evaluating the probability of a target word. We compared our model against IBM1, which does not consider word order ...

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

Application of ensemble learning techniques to model the atmospheric concentration of SO2

In view of pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...

Using Statistical Word Associations for the Retrieval of Strongly-Textual Cases

Lexical relationships allow a textual CBR system to establish case similarity beyond the exact correspondence of words. In this paper, we explore statistical models to insert associations between problems and solutions in the retrieval process. We study two types of models: word cooccurrences and translation alignments. These approaches offer the potential to capture relationships arising betwe...

Confidence estimation for translation prediction

The purpose of this work is to investigate the use of machine learning approaches for confidence estimation within a statistical machine translation application. Specifically, we attempt to learn probabilities of correctness for various model predictions, based on the native probabilities (i.e. the probabilities given by the original model) and on features of the current context. Our experiments ...



Publication date: 2011